{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
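   },
   "source": [
    "## Dalian University of Technology Chinese Affective Lexicon Ontology\n",
    "\n",
    "### 1. Introduction\n",
    "The Chinese Affective Lexicon Ontology (中文情感词汇本体库) is a Chinese ontology resource compiled and annotated by the members of the Information Retrieval Laboratory at Dalian University of Technology, under the direction of Professor Lin Hongfei (林鸿飞). It describes each Chinese word or phrase from several angles, including part of speech, emotion category, emotion intensity, and polarity.\n",
    "\n",
    "The ontology's emotion taxonomy builds on Ekman's influential six-category scheme of basic emotions. On top of Ekman's categories, it adds the category “好” (good) to divide positive emotions more finely. The resulting taxonomy has 7 major categories and 21 subcategories.\n",
    "The resource is intended as a convenient and reliable aid for Chinese text sentiment analysis and opinion-orientation analysis in affective computing. It can be used for multi-class emotion classification as well as for general polarity analysis.\n",
    "\n",
    "An emotion word may carry more than one emotion. The 情感分类 (emotion category) field records the word's primary emotion, and the 辅助情感分类 (auxiliary emotion category) field records any other emotion the word carries alongside the primary one.\n",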
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "http://ir.dlut.edu.cn/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
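   },
   "source": [
    "### 2. Lexicon format\n",
    "\n",
    "| *词语* (word) | *词性种类* (part of speech) | *词义数* (number of senses) | *词义序号* (sense index) | *情感分类* (emotion category) | *强度* (intensity) | *极性* (polarity) | *辅助情感分类* (auxiliary emotion category) | *强度* (auxiliary intensity) | *极性* (auxiliary polarity) |\n",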
    "| -------- | ---------- | -------- | ---------- | ---------- | ------ | ------ | -------------- | ------ | ------ |\n",
    "| 无所畏惧 | idiom      | 1        | 1          | PH         | 7      | 1      |                |        |        |\n",
    "| 手头紧   | idiom      | 1        | 1          | NE         | 7      | 0      |                |        |        |\n",
    "| 周到     | adj        | 1        | 1          | PH         | 5      | 1      |                |        |        |\n",
    "| 言过其实 | idiom      | 1        | 1          | NN         | 5      | 2      |                |        |        |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
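   },
   "source": [
    "### 3. Emotion categories and intensity\n",
    "\n",
    "Following the paper 《情感词汇本体的构造》 (Construction of the Affective Lexicon Ontology), emotions are divided into 7 major categories and 21 subcategories.\n",
    "Intensity takes one of five levels, 1, 3, 5, 7, or 9, where 9 is the strongest and 1 the weakest. Values outside these five levels also occur in the file: an intensity of 8 appears in the filtered output further down.\n",
    "\n",
    "\n",
    "\n",
    "| No. | Major category | Subcategory | Example words |\n",
    "| ---- | -------- | -------- | ------------------------------ |\n",
    "| 1    | 乐 (happiness) | 快乐 joy (PA) | 喜悦、欢喜、笑眯眯、欢天喜地   |\n",
    "| 2    |          | 安心 ease (PE) | 踏实、宽心、定心丸、问心无愧   |\n",
    "| 3    | 好 (good) | 尊敬 respect (PD) | 恭敬、敬爱、毕恭毕敬、肃然起敬 |\n",
    "| 4    |          | 赞扬 praise (PH) | 英俊、优秀、通情达理、实事求是 |\n",
    "| 5    |          | 相信 trust (PG) | 信任、信赖、可靠、毋庸置疑     |\n",
    "| 6    |          | 喜爱 fondness (PB) | 倾慕、宝贝、一见钟情、爱不释手 |\n",
    "| 7    |          | 祝愿 wishing (PK) | 渴望、保佑、福寿绵长、万寿无疆 |\n",
    "| 8    | 怒 (anger) | 愤怒 anger (NA) | 气愤、恼火、大发雷霆、七窍生烟 |\n",
    "| 9    | 哀 (sadness) | 悲伤 sorrow (NB) | 忧伤、悲苦、心如刀割、悲痛欲绝 |\n",
    "| 10   |          | 失望 disappointment (NJ) | 憾事、绝望、灰心丧气、心灰意冷 |\n",
    "| 11   |          | 疚 guilt (NH) | 内疚、忏悔、过意不去、问心有愧 |\n",
    "| 12   |          | 思 longing (PF) | 思念、相思、牵肠挂肚、朝思暮想 |\n",
    "| 13   | 惧 (fear) | 慌 panic (NI) | 慌张、心慌、不知所措、手忙脚乱 |\n",
    "| 14   |          | 恐惧 fear (NC) | 胆怯、害怕、担惊受怕、胆颤心惊 |\n",
    "| 15   |          | 羞 shame (NG) | 害羞、害臊、面红耳赤、无地自容 |\n",
    "| 16   | 恶 (disgust) | 烦闷 annoyance (NE) | 憋闷、烦躁、心烦意乱、自寻烦恼 |\n",
    "| 17   |          | 憎恶 hatred (ND) | 反感、可耻、恨之入骨、深恶痛绝 |\n",
    "| 18   |          | 贬责 reproach (NN) | 呆板、虚荣、杂乱无章、心狠手辣 |\n",
    "| 19   |          | 妒忌 jealousy (NK) | 眼红、吃醋、醋坛子、嫉贤妒能   |\n",
    "| 20   |          | 怀疑 suspicion (NL) | 多心、生疑、将信将疑、疑神疑鬼 |\n",
    "| 21   | 惊 (surprise) | 惊奇 surprise (PC) | 奇怪、奇迹、大吃一惊、瞠目结舌 |"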
   ]
  },
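  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The table above can be read as a lookup from subcategory code to major category. The snippet below is a minimal sketch of that lookup, written by hand from the table. The names `MAJOR_CATEGORIES`, `CODE_TO_MAJOR`, and `major_category` are hypothetical, and the English labels are glosses chosen for this notebook, not part of the lexicon.\n",
    "\n",
    "```python\n",
    "# Hypothetical mapping copied by hand from the table above:\n",
    "# major category -> subcategory codes\n",
    "MAJOR_CATEGORIES = {\n",
    "    'happy': ['PA', 'PE'],\n",
    "    'good': ['PD', 'PH', 'PG', 'PB', 'PK'],\n",
    "    'anger': ['NA'],\n",
    "    'sadness': ['NB', 'NJ', 'NH', 'PF'],\n",
    "    'fear': ['NI', 'NC', 'NG'],\n",
    "    'disgust': ['NE', 'ND', 'NN', 'NK', 'NL'],\n",
    "    'surprise': ['PC'],\n",
    "}\n",
    "\n",
    "# Invert it: subcategory code -> major category\n",
    "CODE_TO_MAJOR = {\n",
    "    code: major\n",
    "    for major, codes in MAJOR_CATEGORIES.items()\n",
    "    for code in codes\n",
    "}\n",
    "\n",
    "\n",
    "def major_category(code):\n",
    "    '''Return the major category for a subcategory code, or None if unknown.'''\n",
    "    return CODE_TO_MAJOR.get(code)\n",
    "\n",
    "\n",
    "print(len(CODE_TO_MAJOR))    # number of subcategory codes in the mapping\n",
    "print(major_category('PF'))  # major category that contains 思 (longing)\n",
    "```\n",
    "\n",
    "A lookup like this makes the grouping explicit. For example, `PF` is placed under 哀 (sadness) in the table even though its code starts with P."
   ]
  },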
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
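   },
   "source": [
    "### 4. Part-of-speech types\n",
    "\n",
    "The lexicon uses 7 part-of-speech types: noun (noun), verb (verb), adjective (adj), adverb (adv), internet word (nw), idiom (idiom), and prepositional phrase (prep).\n",
    "\n",
    "### 5. Polarity annotation\n",
    "\n",
    "Each word has a polarity under each of its emotion categories: 0 is neutral, 1 is positive (commendatory), 2 is negative (derogatory), and 3 is both positive and negative.\n",
    "\n",
    "Note: polarity is determined jointly by the word itself and its emotion, so the same emotion can have polarity 1 for some words and polarity 0 for others.\n",
    "\n",
    "### 6. Storage format and size\n",
    "\n",
    "The lexicon is stored as an Excel file containing 27,466 emotion words, with a file size of 1.22 MB.\n",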
    "\n"
   ]
  },
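  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the part-of-speech and polarity codes described above, the sketch below decodes one entry into readable labels. The helper `describe_entry` and the English label strings are hypothetical; the example row (周到, adj, PH, 5, 1) is taken from the format table in section 2.\n",
    "\n",
    "```python\n",
    "# Polarity codes as described in section 5\n",
    "POLARITY_LABELS = {0: 'neutral', 1: 'positive', 2: 'negative', 3: 'both'}\n",
    "\n",
    "# Part-of-speech codes as described in section 4\n",
    "POS_LABELS = {\n",
    "    'noun': 'noun',\n",
    "    'verb': 'verb',\n",
    "    'adj': 'adjective',\n",
    "    'adv': 'adverb',\n",
    "    'nw': 'internet word',\n",
    "    'idiom': 'idiom',\n",
    "    'prep': 'prepositional phrase',\n",
    "}\n",
    "\n",
    "\n",
    "def describe_entry(word, pos, emotion, intensity, polarity):\n",
    "    '''Format one lexicon entry with human-readable labels.'''\n",
    "    pos_label = POS_LABELS.get(pos, pos)  # fall back to the raw code\n",
    "    polarity_label = POLARITY_LABELS.get(polarity, 'unknown')\n",
    "    return f'{word}: {pos_label}, emotion {emotion}, intensity {intensity}, polarity {polarity_label}'\n",
    "\n",
    "\n",
    "print(describe_entry('周到', 'adj', 'PH', 5, 1))\n",
    "```\n",
    "\n",
    "Unknown codes fall back to the raw part-of-speech value or to `unknown`, so unexpected entries in the file do not raise an error."
   ]
  },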
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:32:16.216885Z",
     "start_time": "2020-10-29T09:32:16.158513Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# IPython help syntax: show the docstring for pd.read_excel\n",
    "pd.read_excel?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:11.444454Z",
     "start_time": "2020-10-29T09:34:08.909099Z"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>词语</th>\n",
       "      <th>词性种类</th>\n",
       "      <th>词义数</th>\n",
       "      <th>词义序号</th>\n",
       "      <th>情感分类</th>\n",
       "      <th>强度</th>\n",
       "      <th>极性</th>\n",
       "      <th>辅助情感分类</th>\n",
       "      <th>强度.1</th>\n",
       "      <th>极性.1</th>\n",
       "      <th>Unnamed: 10</th>\n",
       "      <th>Unnamed: 11</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>脏乱</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>糟报</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>早衰</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NE</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>责备</td>\n",
       "      <td>verb</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>贼眼</td>\n",
       "      <td>noun</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   词语  词性种类 词义数 词义序号 情感分类  强度  极性 辅助情感分类 强度.1 极性.1 Unnamed: 10 Unnamed: 11\n",
       "0  脏乱   adj   1    1   NN   7   2                                         \n",
       "1  糟报   adj   1    1   NN   5   2                                         \n",
       "2  早衰   adj   1    1   NE   5   2                                         \n",
       "3  责备  verb   1    1   NN   5   2                                         \n",
       "4  贼眼  noun   1    1   NN   5   2                                         "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
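   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# keep_default_na=False keeps the emotion code 'NA' (anger) as a string;\n",
    "# by default pandas would parse 'NA' as a missing value (NaN)\n",
    "df = pd.read_excel('./Textmining/情感词汇.xlsx', keep_default_na=False)\n",
    "df.head()"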
   ]
  },
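  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `keep_default_na=False` argument in the cell above matters for this lexicon because `NA` is the code for anger, and pandas treats the string `NA` as a missing value by default. The sketch below illustrates the effect on a two-row inline CSV instead of the real Excel file. It uses `pd.read_csv`, which has the same `keep_default_na` parameter; the variable names are made up for illustration, and the two rows are copied from outputs in this notebook.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Two rows copied from the outputs in this notebook, as inline CSV\n",
    "csv_text = '''词语,情感分类\n",
    "忿怒,NA\n",
    "脏乱,NN\n",
    "'''\n",
    "\n",
    "default_df = pd.read_csv(io.StringIO(csv_text))\n",
    "kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)\n",
    "\n",
    "# With the defaults, the anger code NA is read as NaN and would be lost\n",
    "print(default_df['情感分类'].isna().sum())\n",
    "# With keep_default_na=False it survives as the string 'NA'\n",
    "print(kept_df['情感分类'].tolist())\n",
    "```\n",
    "\n",
    "Without this argument, a filter such as `df['情感分类'] == 'NA'` would match no rows."
   ]
  },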
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:13.657545Z",
     "start_time": "2020-10-29T09:34:13.653549Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(27466, 12)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:14.919721Z",
     "start_time": "2020-10-29T09:34:14.907185Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>词语</th>\n",
       "      <th>词性种类</th>\n",
       "      <th>词义数</th>\n",
       "      <th>词义序号</th>\n",
       "      <th>情感分类</th>\n",
       "      <th>强度</th>\n",
       "      <th>极性</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>脏乱</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>糟报</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>早衰</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NE</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>责备</td>\n",
       "      <td>verb</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>贼眼</td>\n",
       "      <td>noun</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NN</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   词语  词性种类 词义数 词义序号 情感分类  强度  极性\n",
       "0  脏乱   adj   1    1   NN   7   2\n",
       "1  糟报   adj   1    1   NN   5   2\n",
       "2  早衰   adj   1    1   NE   5   2\n",
       "3  责备  verb   1    1   NN   5   2\n",
       "4  贼眼  noun   1    1   NN   5   2"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keep the primary-emotion columns; drop the auxiliary and unnamed columns\n",
    "df = df[['词语', '词性种类', '词义数', '词义序号', '情感分类', '强度', '极性']]\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:16.001639Z",
     "start_time": "2020-10-29T09:34:15.995918Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "词语       忿怒\n",
       "词性种类    adj\n",
       "词义数       1\n",
       "词义序号      1\n",
       "情感分类     NA\n",
       "强度        5\n",
       "极性        0\n",
       "Name: 565, dtype: object"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[565]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:24.387454Z",
     "start_time": "2020-10-29T09:34:24.373131Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>词语</th>\n",
       "      <th>词性种类</th>\n",
       "      <th>词义数</th>\n",
       "      <th>词义序号</th>\n",
       "      <th>情感分类</th>\n",
       "      <th>强度</th>\n",
       "      <th>极性</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>忿忿不平</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>198</th>\n",
       "      <td>怒火冲天</td>\n",
       "      <td>idiom</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>199</th>\n",
       "      <td>气愤愤</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>275</th>\n",
       "      <td>失落</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>560</th>\n",
       "      <td>愤懑</td>\n",
       "      <td>adj</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27262</th>\n",
       "      <td>盗憎主人</td>\n",
       "      <td>idiom</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27263</th>\n",
       "      <td>扑杀此獠</td>\n",
       "      <td>idiom</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27264</th>\n",
       "      <td>冤家路窄</td>\n",
       "      <td>idiom</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27361</th>\n",
       "      <td>脾气</td>\n",
       "      <td>noun</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>NA</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27401</th>\n",
       "      <td>太岁头上动土</td>\n",
       "      <td>idiom</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>NA</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>388 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           词语   词性种类 词义数 词义序号 情感分类  强度  极性\n",
       "190      忿忿不平    adj   1    1   NA   5   2\n",
       "198      怒火冲天  idiom   1    1   NA   7   0\n",
       "199       气愤愤    adj   1    1   NA   5   0\n",
       "275        失落    adj   1    1   NA   7   0\n",
       "560        愤懑    adj   1    1   NA   5   2\n",
       "...       ...    ...  ..  ...  ...  ..  ..\n",
       "27262    盗憎主人  idiom   1    1   NA   5   2\n",
       "27263    扑杀此獠  idiom   1    1   NA   8   2\n",
       "27264    冤家路窄  idiom   1    1   NA   8   2\n",
       "27361      脾气   noun   2    2   NA   5   2\n",
       "27401  太岁头上动土  idiom   1    1   NA   7   2\n",
       "\n",
       "[388 rows x 7 columns]"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select every entry whose primary emotion code is NA (anger)\n",
    "df[df['情感分类'] == 'NA']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:33.538798Z",
     "start_time": "2020-10-29T09:34:29.159116Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Emotion word lists built\n"
     ]
    }
   ],
   "source": [
    "def words_in(codes):\n",
    "    '''Return the set of words whose primary emotion code is in `codes`.'''\n",
    "    return set(df.loc[df['情感分类'].isin(codes), '词语'])\n",
    "\n",
    "# Group the 21 subcategory codes under the 7 major emotion categories\n",
    "Happy = words_in(['PA', 'PE'])\n",
    "Good = words_in(['PD', 'PH', 'PG', 'PB', 'PK'])\n",
    "Surprise = words_in(['PC'])\n",
    "Anger = words_in(['NA'])\n",
    "Sad = words_in(['NB', 'NJ', 'NH', 'PF'])\n",
    "Fear = words_in(['NI', 'NC', 'NG'])\n",
    "Disgust = words_in(['NE', 'ND', 'NN', 'NK', 'NL'])\n",
    "\n",
    "# Sets make the membership tests in later cells constant-time\n",
    "Positive = Happy | Good | Surprise\n",
    "Negative = Anger | Sad | Fear | Disgust\n",
    "print('Emotion word lists built')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:38.378444Z",
     "start_time": "2020-10-29T09:34:38.333805Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "length      25\n",
       "positive     0\n",
       "negative     4\n",
       "anger        0\n",
       "disgust      4\n",
       "fear         0\n",
       "sadness      0\n",
       "surprise     0\n",
       "good         0\n",
       "happy        0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import jieba\n",
    "from collections import Counter\n",
    "\n",
    "def emotion_caculate(text):\n",
    "    '''Count how many tokens of `text` fall into each emotion word list.'''\n",
    "    wordlist = jieba.lcut(text)   # segment the Chinese text into words\n",
    "    wordfreq = Counter(wordlist)  # token -> number of occurrences\n",
    "    # Output label -> word collection built in the previous cell\n",
    "    lexicons = {\n",
    "        'positive': Positive,\n",
    "        'negative': Negative,\n",
    "        'anger': Anger,\n",
    "        'disgust': Disgust,\n",
    "        'fear': Fear,\n",
    "        'sadness': Sad,\n",
    "        'surprise': Surprise,\n",
    "        'good': Good,\n",
    "        'happy': Happy,\n",
    "    }\n",
    "    emotion_info = {'length': len(wordlist)}\n",
    "    for label, words in lexicons.items():\n",
    "        # Add up the occurrences of every token found in this word collection\n",
    "        emotion_info[label] = sum(freq for word, freq in wordfreq.items() if word in words)\n",
    "    return pd.Series(emotion_info)\n",
    "\n",
    "emotion_caculate(text='这个国家再对这些制造假冒伪劣食品药品的人手软的话，那后果真的会相当糟糕。坐牢？从快判个死刑')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-10-29T09:34:50.378487Z",
     "start_time": "2020-10-29T09:34:50.331654Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "length      48\n",
       "positive     3\n",
       "negative     3\n",
       "anger        1\n",
       "disgust      2\n",
       "fear         0\n",
       "sadness      0\n",
       "surprise     3\n",
       "good         0\n",
       "happy        0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "emotion_caculate(text='错愕，平地一声雷怎么会这样？太让人意外了，非常愤怒呀。今天心情不好！股票又跌了让我大吃一惊。，损失惨重，和女朋友也分手了！非常生气，我非常郁闷！！！！')"
   ]
  },
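  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The raw counts returned above depend on text length, so two texts of different lengths are hard to compare directly. One simple option, which is an assumption of this sketch and not something the lexicon prescribes, is to divide each count by the token count in `length`. The helper name `emotion_ratios` is hypothetical, and the sample counts are copied from the output of the cell above.\n",
    "\n",
    "```python\n",
    "def emotion_ratios(counts):\n",
    "    '''Divide every emotion count by the token count stored under length.'''\n",
    "    length = counts['length']\n",
    "    return {\n",
    "        label: (value / length if length else 0.0)  # empty text -> 0.0\n",
    "        for label, value in counts.items()\n",
    "        if label != 'length'\n",
    "    }\n",
    "\n",
    "\n",
    "# Counts copied from the output of the cell above\n",
    "counts = {\n",
    "    'length': 48, 'positive': 3, 'negative': 3, 'anger': 1, 'disgust': 2,\n",
    "    'fear': 0, 'sadness': 0, 'surprise': 3, 'good': 0, 'happy': 0,\n",
    "}\n",
    "ratios = emotion_ratios(counts)\n",
    "print(ratios['surprise'])  # 3 surprise tokens out of 48\n",
    "```\n",
    "\n",
    "The `pd.Series` returned by `emotion_caculate` can be passed in after calling `.to_dict()` on it."
   ]
  },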
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "https://blog.csdn.net/weixin_38008864/article/details/103900840"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
